Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LT-21988: Improve Hermit Crab performance #275

Open
wants to merge 14 commits into
base: release/9.3
Choose a base branch
from
Open

Conversation

jtmaxwell3
Copy link
Collaborator

@jtmaxwell3 jtmaxwell3 commented Feb 28, 2025

I improve Hermit Crab performance by incorporating a new version of SIL.Machine.HermitCrab which improves the performance of strata and by adding a Strata field to the HC Parser Parameters. The combination improved performance of parsing all of the words in the test project from ~3 hours to ~10 minutes if the Strata field is set to "Templates, [C]ɔ-". How to use the Strata field will have to be documented somewhere.


This change is Reviewable

Copy link
Contributor

@JakeOliver28 JakeOliver28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 14 files reviewed, 3 unresolved discussions


Src/LexText/ParserCore/HCLoader.cs line 354 at r1 (raw file):

				{
					// Save these classes for later.
					if (rule == "Clitics")

A switch statement would seem more natural here.


Src/LexText/ParserCore/HCLoader.cs line 434 at r1 (raw file):

			// Remove empty strata.
			foreach (Stratum stratum in m_language.Strata.ToList())

This foreach could be shortened to this:

Code snippet:

m_language.Strata.RemoveAll(stratum =>
        !stratum.Entries.Any() &&
        !stratum.AffixTemplates.Any() &&
        !stratum.MorphologicalRules.Any() &&
        !stratum.PhonologicalRules.Any());
        
        
// if that's not readable enough, we could also do this
var filterFunction = stratum =>
        !stratum.Entries.Any() &&
        !stratum.AffixTemplates.Any() &&
        !stratum.MorphologicalRules.Any() &&
        !stratum.PhonologicalRules.Any();
        
m_language.Strata.RemoveAll(filterFunction);

Src/LexText/ParserCore/HCLoader.cs line 455 at r1 (raw file):

		}

		bool MoveRule(string ruleName, Stratum source, Stratum target)

We could remove redundancy with a helper method here.

Code snippet:

private bool MoveRule(string ruleName, Stratum source, Stratum target)
{
    bool found = false;

    found |= MoveMatchingItems(source.Entries, target.Entries, entry => m_entryName[entry] == ruleName);
    found |= MoveMatchingItems(source.MorphologicalRules, target.MorphologicalRules, rule => rule.Name == ruleName);
    found |= MoveMatchingItems(source.PhonologicalRules, target.PhonologicalRules, rule => rule.Name == ruleName);
    found |= MoveMatchingItems(source.AffixTemplates, target.AffixTemplates, rule => rule.Name == ruleName);

    return found;
}

private bool MoveMatchingItems<T>(IList<T> source, IList<T> target, Func<T, bool> filterFunction)
{
    var itemsToMove = source.Where(filterFunction).ToList();
    if (itemsToMove.Count == 0) return false;

    foreach (var item in itemsToMove)
    {
        target.Add(item);
        source.Remove(item);
    }
    
    return true;
}

Copy link
Contributor

@JakeOliver28 JakeOliver28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 14 files reviewed, 3 unresolved discussions (waiting on @jtmaxwell3)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants